Script Identification from Printed Document Images Using Statistical Features

Author

  • M. M. Kodabagi
Abstract

Automatic identification of the script in a document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR engine in a multilingual environment. In this work, a technique for script identification from document images is proposed. The method uses the vertical and horizontal run components (objects) of the words in a single line of text to distinguish three scripts used in India: Kannada, Hindi and English. First, the method segments words from a selected line of text in the document image. Then statistics of the horizontal and vertical run objects are computed. Finally, a linear discriminant function is used to identify the script of the document image as Kannada, Hindi or English. The method was tested on 300 document images and was found to be robust and efficient. The proposed system achieves an identification accuracy of 93% for Hindi, 90% for English and 86% for Kannada.
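
The pipeline described in the abstract (run extraction, run statistics, linear discriminant) can be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation: the abstract does not say which run statistics are used or how the discriminant is trained, so the choice of mean run length as the feature, the nearest-class-mean discriminant (which assumes an identity covariance), and all function names (`run_lengths`, `run_features`, `fit_class_means`, `classify`) are assumptions. Word segmentation is assumed to have been done already, with each word given as a binary image.

```python
from itertools import groupby


def run_lengths(line):
    """Lengths of maximal runs of foreground (1) pixels in one row or column."""
    return [sum(1 for _ in grp) for val, grp in groupby(line) if val == 1]


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def run_features(word):
    """Return (mean horizontal run length, mean vertical run length).

    `word` is a binary image given as a list of equal-length rows (1 = ink).
    The choice of these two statistics is an assumption for illustration.
    """
    h_runs = [r for row in word for r in run_lengths(row)]
    v_runs = [r for col in zip(*word) for r in run_lengths(col)]
    return (_mean(h_runs), _mean(v_runs))


def fit_class_means(samples):
    """samples: {label: [feature tuples]} -> {label: mean feature vector}."""
    means = {}
    for label, feats in samples.items():
        dim = len(feats[0])
        means[label] = tuple(sum(f[i] for f in feats) / len(feats) for i in range(dim))
    return means


def classify(x, means):
    """Pick the label maximising the linear score g_k(x) = m_k.x - 0.5 * m_k.m_k.

    This is a linear discriminant under the (assumed) identity covariance;
    it is equivalent to choosing the nearest class mean.
    """
    def g(m):
        return sum(mi * xi for mi, xi in zip(m, x)) - 0.5 * sum(mi * mi for mi in m)
    return max(means, key=lambda label: g(means[label]))


# Toy usage with made-up data; "a" and "b" are placeholder class labels.
word = [
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
means = fit_class_means({
    "a": [(4.0, 1.0), (6.0, 1.0)],
    "b": [(1.0, 3.0), (1.0, 5.0)],
})
label = classify(run_features(word), means)
```

In a real system the features from all words of the selected line would presumably be pooled before classification, and the discriminant would be fitted on labelled training images for the three scripts.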

Similar resources

Global Approach for Script Identification using Wavelet Packet Based Features

In a multi-script environment, archives of documents whose text regions are printed in different scripts are common in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the different script regions of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documen...

Handwritten Script Identification from a Bi-Script Document at Line Level using Gabor Filters

In a country like India, where many scripts are in use, automatic identification of printed and handwritten script facilitates many important applications, including sorting of document images and searching online archives of document images. In this paper, a Gabor feature based approach is presented to identify different Indian scripts from handwritten document images. Eight popular In...

Wavelet Packet Based Texture Features for Automatic Script Identification

In a multi-script environment, archives of documents printed in different scripts are common in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the script type of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documents printed in ten Indian scripts ...

Entropy Based Texture Features Useful for Automatic Script Identification

In a multi-script environment, collections of documents printed in different scripts are common in practice. For automatic processing of such documents through Optical Character Recognition, it is necessary to identify the script type of the document. In this paper, a novel texture-based approach is presented to identify the script type of the documents printed in three prioritized scripts Kannada, Hi...

Script Identification of Text Words from a Tri Lingual Document Using Voting Technique

In a multi-script environment, the majority of documents may contain text printed in more than one script or language. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the different script regions of the document. In this context, this paper proposes to develop a model to identify and separate text words of Kannada, H...

Publication date: 2013